The tidyverse is an opinionated collection of R packages designed for data science. Make sure that you have the tidyverse installed by typing install.packages("tidyverse") into the Console in the bottom left corner. You only need to do this once.
For the next little while we will work with “tibbles” instead of R’s traditional data.frame. Tibbles are data frames, but they tweak some older behaviours to make life a little easier.
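To convert an existing data frame to a tibble, use as_tibble() (the older spelling, as.tibble(), is deprecated).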
There are two main differences in the usage of a tibble vs. a classic data.frame: printing and subsetting.
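As a minimal sketch of the subsetting difference, the example below compares a base data.frame with its tibble version. The objects `df_base` and `tb` and their column names are made up for illustration.

```r
library(tibble)

# A small illustrative data frame (hypothetical columns) and its tibble version
df_base <- data.frame(abc = 1:3, xyz = c("a", "b", "c"))
tb <- as_tibble(df_base)

# Printing: a tibble shows its dimensions and column types,
# and only the first rows of a long table
print(tb)

# Subsetting one column with [ :
# a data.frame simplifies to a vector, a tibble stays a tibble
base_col <- df_base[, "abc"]
tbl_col  <- tb[, "abc"]
class(base_col)  # "integer"
class(tbl_col)   # "tbl_df" "tbl" "data.frame"

# Partial name matching with $ :
# a data.frame silently matches "ab" to "abc", a tibble warns and returns NULL
base_partial <- df_base$ab
tbl_partial  <- suppressWarnings(tb$ab)
```

Because `[` on a tibble always returns a tibble, code that uses it behaves the same whether one column or several are selected.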
$ and [[
library(tidyverse)

# Build a small tibble with two random columns
df <- tibble(
  x = runif(5),
  y = rnorm(5)
)
# Extract by name
df$x
## [1] 0.4503653 0.3935854 0.7458050 0.8709692 0.3113979
df[["x"]]
## [1] 0.4503653 0.3935854 0.7458050 0.8709692 0.3113979
# Extract by position
df[[1]]
## [1] 0.4503653 0.3935854 0.7458050 0.8709692 0.3113979
dplyr
Choosing Columns with select()
Renaming variables with rename()
Sorting and Reordering with arrange()
Subsetting and Filtering Data with filter()
Adding New Columns with mutate()
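The five verbs listed above can be sketched on a small made-up tibble. The `people` data and its column names are hypothetical, and each step is stored in its own variable so the effect of each verb is visible.

```r
library(dplyr)
library(tibble)

# Hypothetical example data
people <- tibble(
  name      = c("Ana", "Ben", "Cy"),
  city      = c("Oslo", "Lima", "Pune"),
  height_cm = c(160, 175, 182),
  weight_kg = c(55, 70, 90)
)

# select(): keep only the named columns (drops city)
chosen <- select(people, name, height_cm, weight_kg)

# rename(): new_name = old_name
renamed <- rename(chosen, height = height_cm, weight = weight_kg)

# arrange(): sort rows, here from tallest to shortest
sorted <- arrange(renamed, desc(height))

# filter(): keep only the rows where the condition is TRUE
tall <- filter(sorted, height > 165)

# mutate(): add a column computed from existing ones
result <- mutate(tall, bmi = weight / (height / 100)^2)

result
```

Each verb takes a data frame as its first argument and returns a new one, leaving the input unchanged.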
The pipe, %>%, comes from the magrittr package by Stefan Milton Bache. Packages in the tidyverse load %>% for you automatically, so you don’t usually load magrittr explicitly.
The point of the pipe is to help you write code in a way that is easier to read and understand.
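To illustrate the readability point, here is a minimal sketch that writes the same computation twice, once as nested calls and once with the pipe. The `scores` tibble is made up for the example.

```r
library(dplyr)
library(tibble)

# Hypothetical example data
scores <- tibble(
  name  = c("a", "b", "c"),
  score = c(3, 9, 6)
)

# Nested calls: read from the inside out
nested <- head(arrange(filter(scores, score > 4), desc(score)), 1)

# Piped: read top to bottom, one step per line
piped <- scores %>%
  filter(score > 4) %>%
  arrange(desc(score)) %>%
  head(1)

identical(nested, piped)  # TRUE
```

`x %>% f(y)` is evaluated as `f(x, y)`, so both versions perform the same steps in the same order.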
tidyr
The goal of tidyr is to help you create tidy data. Tidy data is data where:
Every column is a variable.
Every row is an observation.
Every cell is a single value.
Reshaping Data (Wide/Long)
Wide Data vs. Long Data
There are two sets of methods that are explained below:
gather() and spread() from the tidyr package. This is a newer interface to the reshape2 package.
melt() and dcast() from the reshape2 package.
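A minimal sketch of the tidyr pair, using a made-up `wide` tibble: gather() collapses several columns into key/value pairs (wide to long), and spread() reverses it (long to wide). The column names are hypothetical. Current tidyr documentation marks both functions as superseded by pivot_longer() and pivot_wider(), although they remain available.

```r
library(tidyr)
library(tibble)

# Hypothetical wide data: one row per subject, one column per measurement
wide <- tibble(
  id     = 1:2,
  height = c(170, 180),
  weight = c(65, 80)
)

# Wide -> long: stack height and weight into a measure/value pair of columns
long <- gather(wide, key = "measure", value = "value", height, weight)

# Long -> wide: spread the measure/value pairs back out into columns
wide_again <- spread(long, key = measure, value = value)

long
wide_again
```

The long form has one row per subject and measurement, so `long` has four rows here, while `wide_again` has the same columns as `wide`.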